144 research outputs found

    Evaluation of Handwriting Similarities Using Hermite Transform

    http://www.suvisoft.com
    In this paper, we present a new method for denoising and indexing handwritten documents. This work is based on the Hermite Transform, a polynomial transform that is a good model of the human visual system (HVS). We use this transform to analyze handwriting through its visual texture. We apply this analysis to document indexing (finding documents written by the same author) and to document classification (grouping documents containing handwriting with a similar visual aspect). It is often necessary to clean these documents before the analysis step; for that purpose, we also use the Hermite decomposition. The current results are very promising and show that it is possible to characterize handwritten drawings without any a priori grapheme segmentation.

    A new approach for centerline extraction in handwritten strokes: an application to the constitution of a code book

    We present in this paper a new method for analyzing and decomposing handwritten documents into glyphs (graphemes) and their associated code book. The techniques involved are inspired by image processing methods in a broad sense and by mathematical models involving graph coloring. Our approach provides, first, a rapid and detailed characterization of handwritten shapes based on dynamic tracking of the handwriting (curvature, thickness, direction, etc.) and, second, a very efficient analysis method for categorizing basic shapes (graphemes). The tools that we have produced enable paleographers to study a large volume of manuscripts more quickly and accurately, and to extract a large number of characteristics that are specific to an individual or an era.

    Analyse d'images de documents anciens : Catégorisation de contenus par approche texture [Image analysis of old documents: content categorization using a texture approach]

    We propose a characterization of the content of old books based on a non-parametric texture approach. The approach is intended to be generic and adaptable to any type of book, relying on the homogeneity of the textures found within a given book. By applying five texture-extraction algorithms at several resolutions, it is possible to characterize the content of the pages of a book. The method is applied to pages of old books from the 16th century.

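    D'une pondération automatique des caractéristiques des graphèmes à la création des CodeBooks, un nouveau point de vue dédié aux applications CBIR [From automatic weighting of grapheme features to the creation of codebooks: a new point of view dedicated to CBIR applications]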

    Session "Posters". We present in this article a new mechanism for building codebooks from the graphemes obtained by decomposing handwriting. The latter are important for subsequently simplifying the automation of the analysis and transcription of these manuscripts and of the recognition of styles or writers. Our approach provides, on the one hand, a precise selection of grapheme descriptors by genetic algorithms and, on the other hand, an efficient methodology for categorizing grapheme shapes using graph coloring. We show how coupling these two mechanisms ("selection-classification") yields a better separation of the shapes to be categorized by exploiting their grapho-morphological particularities, their densities and their significant orientations.

    Multi One-Class Incremental SVM for Document Stream Digitization

    Within the DIGIDOC project (ANR-10-CORD-0020, CONTenus et INTeractions (CONTINT)), our approach was applied to several image-stream classification scenarios that can correspond to real cases in digitization projects. Most of the time, document processing is considered a well-defined task: the classes (also called concepts) are defined and known before processing starts. But in real industrial document-processing workflows, the concepts may frequently change over time. In a document stream processing context, the information and content of the digitized pages can evolve over time, as can the user's judgment about what to do with the resulting classification. The goal of this application is to create a learning module for stream-based classification of document images (especially dedicated to digitization processes with a huge volume of data) that adapts to different situations in intelligent scanning tasks: adding, extending, contracting, splitting, or merging classes in an online mode of streaming data processing.

    Text lines and snippets extraction for 19th century handwriting documents layout analysis

    In this paper we propose a new approach to improve electronic editions of human sciences corpora by providing an efficient estimation of the structure of manuscript pages. In any handwritten document analysis process, text line segmentation is an important stage. Variable inter-line spacing, inconsistent baseline skews, overlaps and occlusions in unconstrained 19th-century handwritten documents complicate the text line segmentation task. The only prior knowledge of the script that we use is the fact that text line skews can be random and irregular. In that context, we model text line detection as an image segmentation problem, enhancing the text line structure with a Hough transform and a clustering of connected components so that text line boundaries appear. The proposed snippet decomposition for page layout analysis relies on a first step that classifies content pages into five visual and genetic taxonomies, and a second step of text line extraction and snippet decomposition. Experiments show that the proposed method achieves high accuracy in detecting text lines in regular and semi-regular handwritten pages from the corpus of digitized Flaubert manuscripts ("Dossiers documentaires de Bouvard et Pécuchet", 1872-1880).

    Hierarchical decomposition of handwritten manuscripts layouts

    http://www.springerlink.com/content/k6741wt1028l7310/
    In this paper we propose a new approach to improve electronic editions of literary corpora by providing an efficient estimation of the structure of manuscript pages. In any handwritten document analysis process, structure recognition is an important issue. Variable inter-line spacing, inconsistent baseline skews, overlaps and occlusions in unconstrained 19th-century handwritten documents complicate the structure recognition task. Text line and fragment extraction is based on connectivity labelling of the adjacency graph at different resolution levels, for the extraction of borders, lines and fragments.

    Contributions au tri automatique de documents et de courrier d'entreprises [Contributions to the automatic sorting of business documents and mail]

    This thesis deals with the development of industrial vision systems for the automatic sorting of business documents and mail. These systems require very fast processing together with accurate and precise results. Current systems are mostly built from sequential modules that need fast and efficient algorithms throughout the processing chain, from low-level to high-level stages of analysis and content recognition. The existing architectures, which we survey in the first three chapters of the thesis, have weaknesses that show up as reading errors and rejections that are still too often blamed on the OCR. Yet the modules responsible for these rejections and reading errors are mostly the first to intervene in the process: image segmentation and the location of regions of interest. These two processes, which depend on each other, are fundamental to system performance and to the efficiency of automatic sorting lines. We have therefore chosen to focus on different aspects of mail image segmentation and of the location of relevant zones (such as the address block), developing a model based on a new pyramidal approach that uses hierarchical graph coloring. To date, graph coloring has never been exploited in such a context. It is used in our contribution at every stage of document layout analysis and in the decision-making for recognition (recognizing the kind of document and recognizing the address block). The recognition stage is based on a training process with a single graph b-coloring model. Our architecture is designed to guarantee real cooperation between the different analysis and decision modules during the layout analysis and recognition stages. It is composed of three main parts: low-level segmentation (binarisation and connected component labeling), physical layout extraction by hierarchical graph coloring, and address block location and document sorting. The algorithms involved in the system have been designed for execution speed (in line with real-time constraints), robustness, and compatibility. The experiments carried out in this context are very encouraging and open new perspectives for a wider diversity of document images.

    Old document image analysis : a texture approach

    In this article, we propose a method for characterizing images of old documents based on a texture approach. The characterization is carried out through a multi-resolution study of the textures contained in the document images. By extracting five features linked to the frequencies and orientations in the different areas of a page, it is possible to extract and compare elements of a high semantic level without making any hypothesis about the physical or logical structure of the analysed documents. Experiments show the feasibility of building tools that assist navigation or indexing. Through these experiments, we highlight the relevance of these features and the advances they represent in characterizing the content of a deeply heterogeneous corpus.